From Wikipedia, the free encyclopedia

Itanium is the brand name for 64-bit Intel microprocessors that implement
the Intel Itanium architecture (formerly called IA-64). Intel has released 
two processor families using the brand: Itanium and Itanium 2. The 
processors are marketed for use in enterprise servers and high-performance 
computing systems. The architecture originated at Hewlett-Packard (HP) 
and was later developed by HP and Intel together. 

Itanium's architecture differs dramatically from the x86 architectures (and
the x86-64 extensions) used in other Intel processors. The architecture is
based on explicit instruction-level parallelism, with the compiler making
the decisions about which instructions to execute in parallel. This approach
allows the processor to execute up to six instructions per clock cycle. In
contrast to other superscalar architectures, Itanium does not need
elaborate hardware to keep track of instruction dependencies during parallel
execution.

After a protracted development process, the first Itanium was released in
2001, and more powerful Itanium processors have been released periodically
since. HP produces most Itanium-based systems, but several other
manufacturers have also developed systems based on Itanium. As of 2007,
Itanium is the fourth-most deployed microprocessor architecture for
enterprise-class systems, behind x86-64, IBM POWER, and SPARC. After a
schedule slip of several years, Intel released its newest Itanium 2,
codenamed Montecito, in July 2006.
History




Development: 1989-2001 

In 1989, HP determined that reduced instruction set computer (RISC) architectures were
approaching a processing limit at one instruction per cycle. HP researchers investigated a new
architecture called Explicitly Parallel Instruction Computing (EPIC) that allows the processor to
execute multiple instructions in one clock cycle. EPIC implements a form of very long instruction
word (VLIW) architecture, where one instruction word contains multiple instructions. With EPIC,
the compiler determines in advance which instructions can be executed at the same time, so
the microprocessor simply executes the instructions and does not need elaborate
mechanisms to determine which instructions to execute in parallel.

HP determined that it was no longer cost-effective for individual enterprise systems companies such as itself to
develop proprietary microprocessors, so HP partnered with Intel in 1994 to develop the IA-64 architecture, which
derived from EPIC. Intel was willing to undertake a very large development effort on IA-64 in the expectation that
the resulting microprocessor would be used by the majority of the enterprise systems manufacturers. HP and Intel
initiated a large joint development effort with a goal of delivering the first product, codenamed Merced, in 1998.

During development, Intel, HP, and industry analysts were predicting that IA-64 would dominate in servers,
workstations, and high-end desktops, and eventually supplant RISC and complex instruction set computer (CISC)
architectures for all general-purpose applications. Several groups began to develop operating systems for the
architecture, including Microsoft Windows variants, Linux variants, and UNIX variants. By 1997, it was apparent
that the IA-64 architecture and the compiler were much more difficult to implement than originally thought, and the
delivery of the Merced began slipping quarter by quarter.[5] Technical difficulties included the very high transistor
counts needed to support the wide instruction words and the large caches. There were also structural problems within
the project, as the two parts of the joint team used different methodologies and had slightly different priorities. Since
Merced was the first EPIC processor, the development effort encountered more unanticipated problems than the team
was accustomed to. In addition, the EPIC concept depends on compiler capabilities that had never been implemented
before, so more unanticipated research was needed.

Intel announced the official name of the processor, Itanium, on October 4, 1999.[6] Within hours, observers referred to
the processor as Itanic, a reference to Titanic, the "unsinkable" ocean liner which sank in 1912. Itanic has since
often been used by The Register, Scott McNealy, and others.[10][11] It alludes to the perception that Itanium is a
white elephant which cost Intel and HP many billions of dollars while failing to achieve expected performance and
sales in the originally projected timeframe. Meanwhile, RISC and CISC architects were making steady improvements
in superscalar implementations, allowing them to break the one-instruction-per-clock barrier without using EPIC.

Itanium processor: 2001-02 



By the time Itanium was released in June 2001, it was no longer superior to
contemporaneous RISC and CISC processors. Itanium competed at the low end
(primarily 4-CPU and smaller systems) with servers based on x86 processors, and at
the high end with IBM's POWER architecture and Sun Microsystems' SPARC
architecture. Intel repositioned Itanium to focus on high-end business and HPC
computing, attempting to duplicate x86's successful "horizontal" (i.e., single
architecture, multiple systems vendors) market. Its success was limited to replacing
PA-RISC and Alpha in HP systems and MIPS in SGI's HPC systems. POWER and SPARC
remained strong, while the 32-bit x86 architecture grew into the enterprise space.
With economies of scale fueled by its enormous installed base, x86 was the preeminent
"horizontal" architecture in enterprise computing. HP and Intel recognized that
Itanium was not competitive and replaced it with Itanium 2 a year later, as they had
planned. Only a few thousand of the original Itaniums were sold, due to limited
availability caused by poor yields, relatively poor performance, and high cost.
However, these machines were useful for software development for the Itanium 2
processors that followed. IBM delivered a supercomputer based on this processor.[12]

Itanium processor
Produced: June 2001 to June 2002
Manufacturer: Intel
CPU speeds: 733 MHz to 800 MHz
FSB speed: 266 MT/s
Instruction set: Itanium
Socket: PAC418
Core name: Merced

Itanium 2 processors: 2002-present



The Itanium 2 was released in 2002, and was marketed for enterprise servers rather than for the whole gamut of
high-end computing. The initial Itanium 2 was codenamed McKinley. McKinley used a 180 nm process, but it relieved
many of the performance problems of the original Itanium.[13]

In 2003, AMD released the Opteron, which implemented its x86-64 64-bit architecture. Opteron gained rapid
acceptance in the enterprise server space because it provided an easy upgrade from x86. Intel responded by
implementing x86-64 in its Xeon microprocessors in 2004.[14] Intel released a new Itanium 2 family member,
codenamed Madison, in 2003. Madison used a 130 nm process and was the basis of all new Itaniums until Montecito
was released in July 2006.

In March 2005, Intel announced that it was working on a new Itanium device, codenamed Tukwila, to be released in
2007. Tukwila would have four processor cores and would replace the Itanium bus with a new Common System Interface,
which would also be used by a new Xeon.[15] Intel later said that Tukwila would be delivered in late 2008.[16]

In November 2005, the major Itanium server manufacturers joined with Intel and a number of software vendors to
form the Itanium Solutions Alliance to promote the architecture and accelerate software porting.[17] The Alliance
announced that its members would invest $10 billion in Itanium solutions by the end of the decade.



Architecture 



Intel has extensively documented the Itanium instruction set and microarchitecture,[19]
and the technical press has provided overviews. The architecture has been
renamed several times during its history. HP called it EPIC and renamed it to
PA-WideWord. Intel later called it IA-64, before settling on Intel Itanium Architecture,
but it is still widely referred to as IA-64. It is a 64-bit, register-rich, explicitly parallel
architecture. The base data word is 64 bits, byte-addressable. The logical address
space is 2^64 bytes. The architecture implements predication, speculation, and branch
prediction. It uses a hardware register renaming mechanism rather than simple
register windowing for parameter passing. The same mechanism is also used to
permit parallel execution of loops. Speculation, prediction, predication, and
renaming are under control of the compiler; each instruction word includes extra
bits for this. This approach is the distinguishing characteristic of the architecture.

The architecture implements 128 integer registers, 128 floating point registers, 64 one-bit predicates, and eight branch 
registers. The floating point registers are 82 bits long to preserve precision for intermediate results. 
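
Predication, which relies on the one-bit predicate registers, can be illustrated with a small model. The sketch below is a toy Python illustration, not Itanium code: it assumes a compare that writes two complementary predicates, and it uses a hypothetical helper named guarded to stand in for an instruction whose result is kept only when its predicate is true. Both arms of the conditional are issued and no branch is taken.

```python
def branchy_abs_diff(a, b):
    """Reference version: control flow chooses one arm with a branch."""
    if a > b:
        return a - b
    return b - a


def guarded(predicate, new_value, old_value):
    """Hypothetical stand-in for a predicated instruction.

    The result is written only when the guarding predicate is true;
    otherwise the destination keeps its old value.
    """
    return new_value if predicate else old_value


def predicated_abs_diff(a, b):
    """If-converted version: both arms are issued, predicates pick the result."""
    # Assumption: one compare sets a pair of complementary predicates.
    p1 = a > b
    p2 = not p1

    result = 0
    result = guarded(p1, a - b, result)  # takes effect only when p1 is true
    result = guarded(p2, b - a, result)  # takes effect only when p2 is true
    return result
```

Because exactly one of the two predicates is true, the two guarded operations are independent of each other, which is what lets a compiler schedule them in the same cycle instead of waiting on a branch.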

Instruction execution 

Each 128-bit instruction word contains three instructions, and the fetch mechanism can read up to two instruction 
words per clock from the L1 cache into the pipeline. When the compiler can take maximum advantage of this, the
processor can execute six instructions per clock cycle. The processor has thirty functional execution units in eleven 
groups. Each unit can execute a particular subset of the instruction set, and each unit executes at a rate of one 
instruction per cycle unless execution stalls waiting for data. While not all units in a group execute identical subsets 
of the instruction set, common instructions can be executed in multiple units. The groups are: 

■ Six general-purpose ALUs, two integer units, one shift unit

■ Four data cache units

■ Six multimedia units, two parallel shift units, one parallel multiply, one population count

■ Two floating-point multiply-accumulate units, two "miscellaneous" floating-point units

■ Three branch units

Thus, the compiler can often group instructions into sets of six that can execute at the same time. Since the
floating-point units implement a multiply-accumulate operation, a single floating point instruction can perform the work of
two instructions when the application requires a multiply followed by an add: this is very common in scientific
processing. When it occurs, the processor can execute four FLOPs per cycle. For example, the 800 MHz Itanium had a
theoretical rating of 3.2 GFLOPS and the fastest Itanium 2, at 1.67 GHz, was rated at 6.67 GFLOPS.
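
The peak figures in this section follow from simple arithmetic, sketched below in Python. The constant and function names are illustrative only. The model assumes two instruction words fetched per clock with three instructions each, and two multiply-accumulate units that each count as two floating-point operations per cycle.

```python
BUNDLES_PER_CLOCK = 2          # instruction words fetched per clock
INSTRUCTIONS_PER_BUNDLE = 3    # instructions in each 128-bit word
FMAC_UNITS = 2                 # floating-point multiply-accumulate units
FLOPS_PER_FMAC = 2             # one multiply plus one add


def peak_issue_width():
    """Maximum instructions issued per clock cycle."""
    return BUNDLES_PER_CLOCK * INSTRUCTIONS_PER_BUNDLE


def peak_gflops(clock_mhz):
    """Theoretical peak GFLOPS at the given clock frequency in MHz."""
    flops_per_cycle = FMAC_UNITS * FLOPS_PER_FMAC
    return clock_mhz * 1e6 * flops_per_cycle / 1e9


if __name__ == "__main__":
    print(peak_issue_width())   # six instructions per clock
    print(peak_gflops(800))     # the 800 MHz Itanium
    print(peak_gflops(1670))    # a 1.67 GHz Itanium 2
```

At 800 MHz the model reproduces the 3.2 GFLOPS rating. At a rounded 1.67 GHz it gives about 6.68 GFLOPS, slightly above the quoted 6.67; one plausible explanation, which is an assumption here, is that the quoted rating was computed from an unrounded clock closer to 1.667 GHz.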

Memory architecture 

From 2002 to 2006, Itanium 2 processors shared a common cache hierarchy. They had 16 KiB of Level 1 instruction
cache and 16 KiB of Level 1 data cache. The L2 cache was unified (both instruction and data) and was 256 KiB. The
Level 3 cache was also unified and varied in size from 1.5 MiB to 24 MiB. The 256 KiB L2 cache contains sufficient
logic to handle semaphore operations without disturbing the main arithmetic logic unit (ALU).

Main memory is accessed through a bus to an off-chip chipset. The Itanium 2 bus was initially called the McKinley 
bus, but is now usually referred to by Intel's official name: the Scalability Port. The speed of the bus has increased 
steadily with new processor releases. The bus transfers 2x128 bits per clock cycle, so the 200 MHz McKinley bus 
transferred 6.4 GB/s and the 533 MHz Montecito bus transfers 17.056 GB/s.
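
The bus bandwidth figures can be checked with a short calculation. The Python sketch below uses illustrative names and assumes decimal gigabytes (10^9 bytes), which is what the quoted numbers imply.

```python
BITS_PER_CLOCK = 2 * 128                 # two 128-bit transfers per clock cycle
BYTES_PER_CLOCK = BITS_PER_CLOCK // 8    # 32 bytes per clock cycle


def bus_bandwidth_gb_per_s(bus_clock_mhz):
    """Peak bus bandwidth in GB/s (10**9 bytes) for a bus clock in MHz."""
    bytes_per_second = bus_clock_mhz * 10**6 * BYTES_PER_CLOCK
    return bytes_per_second / 1e9


if __name__ == "__main__":
    print(bus_bandwidth_gb_per_s(200))   # McKinley bus
    print(bus_bandwidth_gb_per_s(533))   # Montecito bus
```

Each clock moves 32 bytes, so 200 MHz corresponds to 6.4 GB/s and 533 MHz to 17.056 GB/s, matching the figures quoted for the two buses.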

Architectural changes 

Itaniums released prior to 2006 had hardware support for the IA-32 architecture to permit support for legacy server 
applications, but performance was much worse in comparison with native instruction performance and 
contemporaneous x86 processors. In 2005 Intel developed a software emulator that provided better performance. 
With Montecito, Intel removed IA-32 support from the hardware. 

With Montecito, Intel made enhancements to the architecture in July 2006.[20] The architecture now includes
hardware multithreading: each processor maintains context for two threads of execution. When one thread stalls due
to a memory access, the other thread gains control. Intel calls this "coarse multithreading" to distinguish it from the
"hyper-threading technology" that was used in some x86 and x86-64 microprocessors. Coarse multithreading is well
matched to the Intel Itanium Architecture and results in an appreciable performance gain. Intel also added hardware
support for virtualization. Virtualization allows a software "hypervisor" to run multiple operating system instances on
the processor concurrently. Montecito also features a split L2 cache, adding a dedicated 1 MiB L2 cache for
instructions and converting the original 256 KiB L2 cache to a dedicated data cache.
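
The idea behind switching threads on a memory stall can be shown with a toy simulation. The Python sketch below is not a model of Montecito's actual switching policy. It assumes hypothetical instruction traces in which "C" is a one-cycle compute operation and "M" is a memory access that makes its thread wait a fixed number of cycles, and it assumes that switching threads costs nothing.

```python
def run_coarse_mt(threads, miss_latency):
    """Return the cycle count to finish all traces with switch-on-stall scheduling.

    threads      -- list of strings made of "C" (compute) and "M" (memory access)
    miss_latency -- cycles a thread must wait after issuing an "M"
    """
    n = len(threads)
    next_op = [0] * n     # index of the next operation in each trace
    ready_at = [0] * n    # first cycle at which each thread may run again
    cycle = 0
    current = 0           # thread that currently owns the core

    def unfinished(i):
        return next_op[i] < len(threads[i])

    while any(unfinished(i) for i in range(n)):
        runnable = [i for i in range(n) if unfinished(i) and ready_at[i] <= cycle]
        if not runnable:
            # Every remaining thread is waiting on memory: idle until one is ready.
            cycle = min(ready_at[i] for i in range(n) if unfinished(i))
            continue
        if current not in runnable:
            # The running thread stalled or finished: hand the core to another.
            current = runnable[0]
        op = threads[current][next_op[current]]
        next_op[current] += 1
        cycle += 1
        if op == "M":
            ready_at[current] = cycle + miss_latency
    return cycle
```

With the hypothetical traces "CMCC" and "CCMC" and a three-cycle miss penalty, each trace takes 7 cycles on its own in this model, so running them back to back takes 14 cycles, while run_coarse_mt(["CMCC", "CCMC"], 3) finishes both in 9 because one thread computes while the other waits.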

Hardware support 

Systems 

As of 2007, several manufacturers offer Itanium 2 based systems,
including HP, SGI, NEC, Fujitsu, Unisys, Hitachi, and Groupe Bull.
In addition, Intel offers a chassis that can be used by system
integrators to build Itanium systems. HP, the only one of the
industry's top four server manufacturers to offer Itanium-based
systems today, manufactures at least 80% of all Itanium 2 systems.
HP sold 7200 systems in the first quarter of 2006.[22] The bulk of
the sales are of enterprise servers and machines for large-scale
technical computing, with an average selling price per system in
excess of US$200,000. A typical system uses eight or more Itanium
processors.



Chipsets 

The Itanium bus interfaces to the rest of the system via a chipset. 
Enterprise server manufacturers differentiate their systems by 
designing and developing chipsets that interface the processor to 
memory, interconnections, and peripheral controllers. The chipset is 
the heart of the system-level architecture for each system design. Development of a chipset costs tens of millions of
dollars and represents a major commitment to the use of the Itanium. Currently, modern chipsets for Itanium are
manufactured by HP, Fujitsu, SGI, NEC, Hitachi, and Unisys. IBM created a chipset in 2003, and Intel in 2002, but
neither of them has developed chipsets to support newer technologies such as DDR2 or PCI Express.

Software support 

In order to allow more software to run on the Itanium, Intel supported the development of effective compilers for its
platform, especially its own suite of compilers.[24][25] GCC is also able to produce machine code for Itanium.[26][27]
As of early 2007, Itanium is supported by Windows Server 2003, multiple distributions of Linux (including Debian,
Red Hat, and Novell SuSE), and HP-UX, OpenVMS, and NonStop from HP, all natively. It also supports the mainframe
environment GCOS from Groupe Bull and several IA-32 operating systems via instruction set simulators. Using
QuickTransit, application binary software for IRIX/MIPS and Solaris/SPARC can run via "dynamic binary
translation" on Linux/Itanium. According to the Itanium Solutions Alliance, as of early 2007 over 10,000 applications
are available for Itanium-based systems,[28] but Sun contests this number. The Alliance also supports Gelato, an
Itanium HPC user group and developer community that ports and supports open source software for Itanium.[30]

Competition 

The Itanium 2 competes in the enterprise server market. Itanium's major competitors include Sun Microsystems'
UltraSPARC T2 and UltraSPARC IV+, IBM's POWER6, AMD's Opteron, and Intel's own Xeon servers. 



Server Manufacturers' Itanium Products

Company    Latest product
HP         Integrity
IBM
SGI

Throughout its history, Itanium has had the best floating point performance relative to fixed-point performance of any
general-purpose microprocessor. This capability is useful in HPC systems but is not needed for most enterprise server 
workloads. 

Supercomputers 

One computer based on Itanium 2 appeared in the top 10 of the June 2007 list of the TOP500 supercomputers: HLRB-II,
at position ten.[31] HLRB-II is operated by the Leibniz Computing Center. It is an SGI Altix 4700 cluster with
9728 Itanium 2 (1.6 GHz) CPUs. Its maximum sustained processing capacity is 56.5 teraflops.[32]

The best position ever achieved by an Itanium 2 based system in the list was #2, achieved in June 2004, when
Thunder (LLNL) entered the list with an Rmax of 19.94 teraflops. In November 2004, Columbia entered the list at
#2 with 51.8 teraflops. The share of Itanium-based machines on the list peaked in the November 2004 list
at 16.8%; in June 2007, it was 5.6%.

Processors 

Released processors 

The Itanium processors show a steady progression in capability. Merced was a proof of concept. McKinley
dramatically improved the memory hierarchy and allowed Itanium to become reasonably competitive. Madison, with 
the shift to a 130 nm process, allowed for enough cache space to overcome the major performance bottlenecks. 
Montecito, with a 90 nm process, allowed for a dual-core implementation and a major improvement in performance 
per watt. 

Future processors

The future of the Itanium family apparently lies in multi-core chips, based on available information about coming 
generations. As of June 2007, some information is known for the following: 

■ Montvale will be a revision of Montecito bringing slightly higher clock speeds (to 1.66 GHz) and a faster FSB
(to 667 MHz). The processor will implement a new power-saving system. Montvale will comprise a set of six
variants called the Itanium 2 9100 series. Release is expected in week 44 of 2007.[35] The processors
were originally expected to be released in June 2007, a year after Montecito.

■ Tukwila, the first 65 nm design, is due in late 2008. Tukwila will include four cores, large on-die caches,
Hyper-Threading technology, and an integrated memory controller, and will implement double-device data
correction, which helps to fix memory errors. Tukwila will also implement Intel QuickPath Interconnect, a new
memory interface that replaces the Itanium bus. QuickPath will also be used on the Xeon Nehalem, so Tukwila
can use the same chipsets as Nehalem.[37]

■ Poulson will use a 32 nm process and will feature four or more cores, multithreading enhancements, and new
instructions to take advantage of parallelism, especially in virtualization.

■ For Kittson, few details are known other than the existence of the codename.[37]

Timeline 

■ 1989:
    ■ HP begins investigating EPIC

■ 1994:
    ■ June: HP and Intel announce partnership

■ 1995:
    ■ September: HP, Novell, and SCO announce plans for a "high volume UNIX operating system" to deliver
      "64-bit networked computing on the HP/Intel architecture" [39]

■ 1997:
    ■ June: IDC predicts IA-64 systems sales will reach $38bn/yr by 2001
    ■ October: Dell announces it will use IA-64 [40]
    ■ December: Intel and Sun announce joint effort to port Solaris to IA-64 [41]

■ 1998:
    ■ March: SCO admits HP/SCO Unix alliance is now dead
    ■ June: IDC predicts IA-64 systems sales will reach $30bn/yr by 2001
    ■ June: Intel announces Merced will be delayed, from second half of 1999 to first half of 2000 [42]
    ■ IBM announces it will build IA-64 machines
    ■ October: Project Monterey is formed to create a common UNIX for IA-64

■ 1999:
    ■ February: Project Trillian is formed to port Linux to IA-64
    ■ August: IDC predicts IA-64 systems sales will reach $25bn/yr by 2002
    ■ October: Intel announces the Itanium name
    ■ October: the term Itanic is first used

■ 2000:
    ■ February: Project Trillian delivers source code
    ■ June: IDC predicts Itanium systems sales will reach $25bn/yr by 2003
    ■ July: Sun and Intel drop Solaris-on-Itanium plans
    ■ August: AMD releases specification for x86-64, a set of 64-bit extensions to Intel's own x86 architecture
      intended to compete with IA-64. It will eventually market this under the name "AMD64"

■ 2001:
    ■ June: IDC predicts Itanium systems sales will reach $15bn/yr by 2004
    ■ June: Project Monterey dies
    ■ July: Itanium is released
    ■ October: IDC predicts Itanium systems sales will reach $12bn/yr by the end of 2004 [2]
    ■ November: IBM's 320-processor Titan NOW Cluster at National Center for Supercomputing
      Applications is listed on the TOP500 list at position #34 [12]
    ■ December: Gelato is formed

■ 2002:
    ■ March: IDC predicts Itanium systems sales will reach $5bn/yr by end 2004
    ■ June: Itanium 2 is released

■ 2003:
    ■ April: IDC predicts Itanium systems sales will reach $9bn/yr by end 2007
    ■ April: AMD releases Opteron, the first processor with x86-64 extensions
    ■ June: Intel releases the "Madison" Itanium 2

■ 2004:
    ■ February: Intel announces it has been working on its own x86-64 implementation (which it will
      eventually market under the name "Intel 64")
    ■ June: Intel releases its first processor with x86-64 extensions, a Xeon processor codenamed "Nocona"
    ■ June: Thunder, a system at LLNL with 4096 Itanium 2 processors, is listed on the TOP500 list at position #2
    ■ November: Columbia, an SGI Altix 3700 with 10160 Itanium 2 processors at NASA Ames Research
      Center, is listed on the TOP500 list at position #2 [46]
    ■ December: Itanium system sales for 2004 reach $1.4bn

■ 2005:
    ■ January: HP ports OpenVMS to Itanium [47]
    ■ February: IBM server design drops Itanium support [48][23]
    ■ June: An Itanium 2 sets a record SPECfp2000 result of 2,801 [49] in a Hitachi, Ltd. Computing blade
    ■ September: Itanium Solutions Alliance is formed
    ■ September: Dell exits the Itanium business [51]
    ■ October: Itanium server sales reach $619M/quarter in the third quarter
    ■ October: Intel announces one-year delays for Montecito, Montvale, and Tukwila [16]

■ 2006:
    ■ January: Itanium Solutions Alliance announces a $10bn collective investment in Itanium by 2010
    ■ February: IDC predicts Itanium systems sales will reach $6.6bn/yr by 2009 [3][52][53]
    ■ July: Intel releases the dual-core "Montecito" Itanium 2 [54]